# Dynamic Masked Attention
The Doge family of models uses dynamic masked attention for sequence transformation, with state transitions handled by either multi-layer perceptrons (MLPs) or a cross-domain mixture of experts (MoE). All entries below are large language models served through the Transformers library and released under Apache-2.0.

| Model | Author | License | Downloads | Likes | Languages | Description |
|-------|--------|---------|-----------|-------|-----------|-------------|
| Doge 20M Chinese | wubingheng | Apache-2.0 | 65 | 2 | Multiple languages | Sequence-transformation model using dynamic masked attention, with MLP or cross-domain MoE state transitions. |
| Doge 120M MoE Instruct | SmallDoge | Apache-2.0 | 240 | 1 | English | Instruction-tuned variant using dynamic masked attention, with MLP or cross-domain MoE state transitions. |
| Doge 320M Instruct | SmallDoge | Apache-2.0 | 12.61k | 3 | English | Lightweight model based on dynamic masked attention, trained with supervised fine-tuning (SFT) and direct preference optimization (DPO); suited to question answering and dialogue. |
| Doge 320M | SmallDoge | Apache-2.0 | 3,028 | 4 | Multiple languages | Base sequence-transformation model using dynamic masked attention, with MLP or cross-domain MoE state transitions. |
| Doge 160M Reason Distill | SmallDoge | Apache-2.0 | 26 | 4 | English | Reasoning-distilled lightweight model combining dynamic masked attention with a cross-domain MoE; focused on reasoning and question-answering tasks. |
| Doge 160M | SmallDoge | Apache-2.0 | 4,227 | 4 | Multiple languages | Small language model using dynamic masked attention, trained by the SmallDoge community; supports text generation. |
| Doge 20M Instruct | SmallDoge | Apache-2.0 | 5,010 | 4 | English | Small model based on dynamic masked attention; supports instruction following and Q&A. |
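The model cards above all describe the same core mechanism: attention whose mask is computed dynamically from the input rather than being fixed in advance. The sketch below is a toy illustration of that general idea, not the SmallDoge implementation; the gate projection and the function name are hypothetical, and the real architecture may compute its dynamic mask differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dynamic_masked_attention(q, k, v, gate):
    """Single-head attention with an input-dependent ("dynamic") mask.

    q, k, v: (batch, seq, dim); gate: nn.Linear(dim, 1), hypothetical.
    """
    b, t, d = q.shape
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5             # (b, t, t) logits
    causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float("-inf"))        # fixed causal part
    # Dynamic part: a per-key gate computed from the value states is added to
    # the logits, softly masking out keys the gate scores low for this input.
    key_bias = F.logsigmoid(gate(v)).squeeze(-1)              # (b, t)
    scores = scores + key_bias.unsqueeze(1)                   # broadcast over queries
    return F.softmax(scores, dim=-1) @ v                      # (b, t, dim)

# Smoke test with random tensors.
b, t, d = 2, 8, 16
gate = nn.Linear(d, 1)
out = dynamic_masked_attention(torch.randn(b, t, d), torch.randn(b, t, d),
                               torch.randn(b, t, d), gate)
print(out.shape)  # torch.Size([2, 8, 16])
```

The appeal of this style of masking is that it stays soft and learned: uninformative positions can be suppressed per input without changing the standard quadratic attention shape.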
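Since every entry lists Transformers as its serving library, a checkpoint can be loaded in the usual way. A minimal sketch, assuming the models are published on the Hugging Face Hub under IDs matching the author and model names above (the repo ID here is an assumption):

```python
# Minimal usage sketch via Transformers. The Hub ID "SmallDoge/Doge-20M-Instruct"
# is assumed from the author/model names in the table; trust_remote_code=True is
# needed only if the architecture ships as custom code rather than a built-in
# Transformers class.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SmallDoge/Doge-20M-Instruct"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Instruct variants are chat-tuned, so format the prompt with the chat template.
messages = [{"role": "user", "content": "What is dynamic masked attention?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```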